Abstract

We consider a general, neutral, dynamical model of biodiversity. Individuals have i.i.d.
lifetime durations, which are not necessarily exponentially distributed, and each individual
gives birth independently at constant rate $\lambda$. Thus, the population size is a homogeneous,
binary Crump-Mode-Jagers process (which is not necessarily a Markov process). We assume
that types are clonally inherited.

We consider two classes of speciation models in this setting. In the immigration model,
new individuals of an entirely new species singly enter the population at constant rate $\mu$
(e.g., from the mainland into the island). In the mutation model, each individual independently
experiences point mutations in its germ line, at constant rate $\theta$.

We are interested in the species abundance distribution, i.e., in the numbers, denoted
$I_n(k)$ in the immigration model and $A_n(k)$ in the mutation model, of species represented
by $k$ individuals, $k = 1, 2, \dots, n$, when there are $n$ individuals in the total population.

In the immigration model, we prove that the numbers $(I_t(k); k \ge 1)$ of species represented
by $k$ individuals at time $t$ are independent Poisson variables with parameters as in
Fisher's log-series. When conditioning on the total size of the population to equal $n$, this
results in species abundance distributions given by Ewens' sampling formula. In particular,
$I_n(k)$ converges as $n \to \infty$ to a Poisson r.v. with mean $\gamma/k$, where $\gamma := \mu/\lambda$.

In the mutation model, as $n \to \infty$, we obtain the almost sure convergence of $n^{-1}A_n(k)$
to a nonrandom explicit constant. In the case of a critical, linear birth-death process,
this constant is given by Fisher's log-series, namely $n^{-1}A_n(k)$ converges to $\alpha^k/k$, where
$\alpha := \lambda/(\lambda+\theta)$.

In both models, the abundances of the most abundant species are briefly discussed. 



Running head. Neutral models of biodiversity with general lifetimes. 

Key words and phrases. Species abundance distribution - Crump-Mode-Jagers process - 
splitting tree - branching process - linear birth-death process - immigration - mutation - 
infinitely-many alleles model - Fisher logarithmic series - Ewens sampling formula - coalescent 
point process - scale function. 
1 Introduction



Our goal is to study two models of speciation in the vein of the neutral theory of biodiversity
[14]: an immigration model and a mutation model, both in the same general birth/death dynamical
setting. A specific feature of our results is that no assumption is made on the distribution
of lifetime durations, in contrast with the usual Markovian dynamics where this distribution is
exponential.

We assume that particles behave independently of one another, that each particle gives
birth at constant rate $\lambda$ during its lifetime (interbirth durations are i.i.d. exponential
random variables with parameter $\lambda$), and that lifetime durations are i.i.d. Then the process
$(N_t; t \ge 0)$ giving the number of extant individuals at time $t$ belongs to a wide class of
branching processes called Crump-Mode-Jagers processes. More precisely, the processes we consider
are homogeneous (constant birth rate) and binary (one birth at a time), but they are more general
than classical birth-death processes in that the lifetime durations may follow a general
distribution.
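The dynamics just described can be sketched as a small simulation. This is only an illustrative sketch, not the construction used in the paper: the function name `simulate_population_size` and the way the lifetime law is passed (a callable `sample_lifetime`) are assumptions made here for the example.

```python
import random

def simulate_population_size(t, birth_rate, sample_lifetime, rng):
    """Return N_t for one homogeneous, binary Crump-Mode-Jagers process
    started from a single progenitor born at time 0.

    sample_lifetime(rng) draws one lifetime duration (any positive law).
    All names here are hypothetical and chosen for this illustration.
    """
    alive_at_t = 0
    pending = [0.0]  # birth times of individuals not yet processed
    while pending:
        birth = pending.pop()
        death = birth + sample_lifetime(rng)
        if death > t:
            alive_at_t += 1
        # Births happen at the points of a Poisson process with rate
        # birth_rate during the lifetime; births after time t are ignored.
        end = min(death, t)
        clock = birth
        while birth_rate > 0:
            clock += rng.expovariate(birth_rate)
            if clock >= end:
                break
            pending.append(clock)
    return alive_at_t
```

For instance, `simulate_population_size(2.0, 1.0, lambda r: r.gammavariate(2.0, 0.5), random.Random(1))` draws $N_2$ with birth rate 1 and Gamma-distributed (hence non-exponential) lifetimes, a case in which the population size is not a Markov process.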

Now each individual bears some type (or, equivalently, belongs to some species), and we will
assume that, at each birth time $t$, the type of the mother at time $t$ is passed on to her offspring
without modification. However, new species can arise in this population. New types can
arise in two ways, which define the two speciation models.




Figure 1: The immigration model. Time axis is vertical; horizontal axis shows filiation. Solid 
dots show the arrival times of immigrants, who all have distinct types labelled by letters a, b, c. 
The type of each extant individual is also shown. 
The immigration model is a generalization of Karlin and McGregor's model [18] to general
lifetimes. It is intended to model a population on an island receiving immigrants from the mainland,
as in the theory of island biogeography [23]. We assume that new propagules singly enter the
island population at the instants of a Poisson process with rate $\mu$, called the immigration
rate, and behave from then on as the other particles on the island. Each of these immigrating
particles is of an entirely new species, and its whole descendance is clonal. See Figure 1.

In the mutation model, we assume that the germ line of each particle experiences mutations
during the whole lifetime of the particle. At the instants of a Poisson process with rate $\theta$, the
type of the particle changes to an entirely new type, as in the infinitely-many alleles model [9].
See Figure 2.
Figure 2: The mutation model. Time axis is vertical; horizontal axis shows filiation. Solid dots
show the mutation events. Each mutation yields a new type, labelled by letters a, b, c, d. The 
type of each extant individual is also shown. 

Another way of seeing the model is to replace the word particle with the word colony, and
the word population with the word metapopulation. Then in our model, all individuals of
a colony are of the same species, lifetimes are extinction times of colonies, and birth events
correspond to propagules sent out by a colony to found a brand new colony. Immigration
events correspond to propagules immigrating from the mainland and immediately founding
a brand new colony. Mutation events correspond to mutants appearing in a colony and getting
to fixation instantaneously. This way of modeling speciation is more satisfactory, but we stick
to the first terminology so as not to obscure the reading.
2 Statements of results and Fisher's logarithmic series



In [10, 11], R.A. Fisher and his coauthors suggested a simple model of species count where the
probability of observing $k$ individuals of a given species is $c\alpha^k/k$ for some constant $\alpha \in (0,1)$.
Following this, a number of authors proposed dynamical models where this so-called log-series
not only gives the distribution of the number of individuals of a single species, but also the
multivariate species abundance distribution of a community, in the sense that the number
of species represented by $k$ individuals follows independently a Poisson distribution with
parameter $c\alpha^k/k$. For example, Karlin and McGregor [18] studied various dynamical models of
structured populations, including a critical birth-death process with immigration which is a
particular case of our immigration model (i.e., where the lifespan is exponentially distributed),
satisfying the previously described property. See also [19, 20], and [28] for a very nice and
comprehensive account of these models and of their associated multivariate distributions.

Let us fix some time $t$. In the immigration model (resp. in the mutation model), we let
$I_t(k)$ (resp. $A_t(k)$) denote the number of species represented by $k$ individuals at time $t$. When
conditioning on the total number of individuals being $n$ at this fixed time $t$, we will write
$I_n(k)$ instead of $I_t(k)$ and $A_n(k)$ instead of $A_t(k)$. The vectors $(I(k))_k$ and $(A(k))_k$ are called
frequency spectra.

In the immigration model, we actually provide a rather accurate result (Theorem 4.1) on
the spectrum at any time $t$, without conditioning on the number of individuals, stating that
the random variables $(I_t(k))_k$ are independent Poisson variables with parameters as in Fisher's
log-series, with a parameter $\alpha$ depending on time $t$. In Corollary 4.2, we prove that the
random vector $(I_n(1), \dots, I_n(n))$ has the same law as a vector of independent Poisson variables
$(Y_1, \dots, Y_n)$ conditioned on $\sum_{k=1}^n kY_k = n$, where $Y_k$ follows the Poisson distribution with
parameter $\gamma/k$, $\gamma$ being defined as the immigration-to-birth rate ratio $\mu/\lambda$. These two results
are known in the case of a critical, linear birth-death process [18]. Notice that the conditioning
in the corollary removes not only the dependence upon the origination time $t$, but also the
dependence upon the distribution of lifetime durations. This spectrum is exactly the one described
by Ewens' sampling formula [8, 9]. The asymptotic behaviour of this spectrum is well known (see for
example [6]): for any fixed $j$,

\[
\lim_{n\to\infty} (I_n(1), I_n(2), \dots, I_n(j)) = (Y_1, Y_2, \dots, Y_j) \quad \text{in distribution},
\]

where the $Y_k$'s are independent Poisson variables with respective parameters $\gamma/k$.
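The link between conditioned independent Poisson variables and Ewens' sampling formula can be checked by brute force for small $n$. The sketch below enumerates all spectra $(a_1, \dots, a_n)$ with $\sum_k k a_k = n$ and compares the conditioned Poisson law with the classical formula $\frac{n!}{\gamma(\gamma+1)\cdots(\gamma+n-1)} \prod_k \frac{(\gamma/k)^{a_k}}{a_k!}$. The function names are hypothetical and chosen for this illustration only.

```python
from math import factorial, isclose, prod

def partitions(n, largest=None):
    """Yield the spectra of n as dicts {size k: number a_k of species of size k}."""
    if largest is None:
        largest = n
    if n == 0:
        yield {}
        return
    for k in range(min(n, largest), 0, -1):
        for a in range(n // k, 0, -1):
            for rest in partitions(n - a * k, k - 1):
                yield {k: a, **rest}

def poisson_weight(spectrum, gamma):
    """Probability that independent Poisson(gamma/k) variables Y_k equal a_k,
    up to the factor prod_k exp(-gamma/k), which is the same for every
    spectrum and therefore cancels when conditioning."""
    return prod((gamma / k) ** a / factorial(a) for k, a in spectrum.items())

def conditioned_poisson_law(n, gamma):
    """Law of (Y_1, ..., Y_n) conditioned on sum_k k * Y_k = n."""
    spectra = list(partitions(n))
    weights = [poisson_weight(s, gamma) for s in spectra]
    total = sum(weights)
    return [(s, w / total) for s, w in zip(spectra, weights)]

def ewens_probability(spectrum, n, gamma):
    """Ewens' sampling formula with parameter gamma for a sample of size n."""
    rising = prod(gamma + i for i in range(n))
    return factorial(n) / rising * poisson_weight(spectrum, gamma)
```

Calling `conditioned_poisson_law(6, 1.7)` and comparing each probability with `ewens_probability(spectrum, 6, 1.7)` gives agreement up to rounding, for any positive value of `gamma`.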

This result contrasts with the mutation model, where species with abundance $k$ are shown
to accumulate linearly with population size, instead of stabilizing as above. First, Theorem
5.1 gives the expected number of species with a fixed age and with abundance $k$. Then Theorem
5.3 gives exact formulae for the almost-sure asymptotic accumulation of species with given
abundances. In the case of a critical birth-death process with (birth/death rate $\lambda$ and) mutation
rate $\theta$, we get

\[
\lim_{n\to\infty} n^{-1}A_n(k) = c\,\frac{\alpha^k}{k} \quad \text{a.s.},
\]

where $\alpha := \lambda/(\lambda+\theta)$ and $c = (1-\alpha)/\alpha$. We also have the a.s. convergence of the total
number of species $A_n$ divided by $n$ to $-c\ln(1-\alpha)$.
Thus, species with $k$ individuals tend to accumulate linearly with sample size in the mutation
model, while their number converges to a finite random variable in the immigration
model. This has an important consequence for the species with a large number of individuals.
In the immigration model, it can be shown that the oldest $j$ species on the island have a number
of individuals of the order of $n$, as $n$ grows. In the mutation model, in contrast, the
proportion $B_n(k)$ of individuals belonging to species with at least $k$ individuals is

\[
B_n(k) = 1 - n^{-1}\sum_{j=1}^{k-1} jA_n(j) \longrightarrow 1 - \sum_{j=1}^{k-1} c\alpha^j
= 1 - (1-\alpha)\sum_{j=1}^{k-1}\alpha^{j-1} = \alpha^{k-1}.
\]

As a consequence, for any $\varepsilon > 0$, there is an integer $k$ such that $\limsup_n B_n(k) < \varepsilon$. Actually,
independent calculations [4] show that the most abundant species have abundances of the order
of $n^\beta$, with $\beta = 1 - \theta/\eta$, where $\eta$ is the exponential growth rate of the total population, in the
case when the mutation rate $\theta$ is smaller than $\eta$. In the case when $\theta > \eta$, these abundances
are of the order of $\log(n)$.
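The limit $\alpha^{k-1}$ and the choice of $k$ for a given $\varepsilon$ can be illustrated numerically in the critical birth-death case. The helper names below are hypothetical; the sketch only evaluates the limiting expressions quoted above, with $\alpha = \lambda/(\lambda+\theta)$ and $c = (1-\alpha)/\alpha$.

```python
from math import isclose

def limiting_mass_at_least(k, lam, theta):
    """Limit of B_n(k): 1 minus the sum over j < k of c * alpha**j."""
    alpha = lam / (lam + theta)
    c = (1 - alpha) / alpha
    return 1 - sum(c * alpha ** j for j in range(1, k))

def smallest_k_below(eps, lam, theta):
    """Smallest k such that the limiting proportion alpha**(k-1) is below eps."""
    alpha = lam / (lam + theta)
    k = 1
    while alpha ** (k - 1) >= eps:
        k += 1
    return k
```

With $\lambda = \theta = 1$ (so $\alpha = 1/2$), `smallest_k_below(0.01, 1.0, 1.0)` returns 8, since $2^{-7} < 0.01 \le 2^{-6}$.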



3 Splitting trees and coalescent point processes 

The genealogical trees that we consider here are usually called splitting trees [12]. Splitting
trees are those random trees where individuals give birth at constant rate $\lambda$ during a lifetime
with general distribution $\pi(\cdot)/\lambda$, to i.i.d. copies of themselves, where $\pi$ is a positive measure
on $(0,\infty]$ with total mass $\lambda$ called the lifespan measure. We assume that they are started with
one unique progenitor born at time 0. We denote by $\mathbb{P}$ their law, and the subscript $s$ in $\mathbb{P}_s$
means conditioning on the lifetime of the progenitor being $s$. Of course if $\mathbb{P}$ bears no subscript,
this means that the lifetime of the progenitor follows the usual distribution $\pi(\cdot)/\lambda$.

In [22], we have considered the so-called jumping chronological contour process (JCCP) of
the splitting tree truncated up to height (time) $t$, which starts at $\min(s,t)$, where $s$ is the death
time of the progenitor, visits all existence times (smaller than $t$) of all individuals exactly once
and terminates at 0. We have shown [22, Theorem 4.3] that the JCCP is a Markov process;
more specifically, it is a compound Poisson process $X$ with jump measure $\pi$, compensated at
rate $-1$, reflected below $t$, and killed upon hitting 0. We denote the law of $X$ by $P$, to
distinguish it from the law $\mathbb{P}$ of the CMJ process. As above, we record the lifetime
duration, say $s$, of the progenitor, by writing $P_s$ for its conditional law on $X_0 = s$.

Let us be a little more specific about the JCCP. Recall that this process visits all existence
times of all individuals of the truncated tree. For any individual of the tree, we denote by $\alpha$ its
birth time and by $\omega$ its death time. When the visit of an individual $v$ with lifespan $(\alpha(v), \omega(v)]$
begins, the value of the JCCP is $\omega(v)$. The JCCP then visits all the existence times of $v$'s
lifespan at constant speed $-1$. If $v$ has no child, then this visit lasts exactly the lifespan of
$v$; if $v$ has at least one child, then the visit is interrupted each time a birth time of one of $v$'s
daughters, say $w$, is encountered (youngest child first since the visit started at the death level).
At this point, the JCCP jumps from $\alpha(w)$ to $\omega(w) \wedge t$ and starts the visit of the existence
times of $w$. Since the tree has finite length, the visit of $v$ has to terminate: it does so at the
chronological level $\alpha(v)$ and continues the exploration of the existence times of $v$'s mother, at
the height (time) where it had been interrupted. This procedure then goes on recursively until
0 is encountered (birth time of the progenitor). See Figure 3 for an example.
Figure 3: a) A realization of a splitting tree with finite extinction time. Horizontal axis has no
interpretation, but horizontal arrows indicate filiation; vertical axis indicates real time; b) The 
associated jumping chronological contour process with jumps in solid line. 

Since the JCCP is Markovian (as seen earlier, it is a reflected, killed Lévy process), its
excursions between consecutive visits of points at height $t$ are i.i.d. excursions of $X$. Observe
in particular that the number of visits of $t$ by $X$ is exactly the number $N_t$ of individuals alive
at time $t$, where $N$ is the aforementioned homogeneous, binary Crump-Mode-Jagers process.
See Figure 4.

This property has two consequences, the first of which will be exploited in the immigration 
model, and the second one in the mutation model. 

The first consequence is the computation of the one-dimensional marginals of $N$. Let $T_A$
denote the first hitting time of the set $A$ by $X$. Conditional on the initial progenitor having
lived $s$ units of time, we have

\[
\mathbb{P}_s(N_t = 0) = P_s(T_0 < T_{(t,+\infty)}), \tag{1}
\]

and, applying recursively the strong Markov property,

\[
\mathbb{P}_s(N_t = k \mid N_t \ne 0) = P_t(T_{(t,+\infty)} < T_0)^{k-1}\,P_t(T_0 < T_{(t,+\infty)}). \tag{2}
\]

Note that the subscript $s$ in the last display is useless.

The second consequence is that because $X$ is (strongly) Markovian, the depths of the
excursions of $X$ away from $t$ are i.i.d., distributed as some random variable $H := t - \inf_{0 \le s \le T} X_s$,
where $X$ is started at $t$ and $T$ denotes the first hitting time $T_0 \wedge T_{(t,+\infty)}$ of $\{0\} \cup (t,+\infty)$ by $X$.
We record this by letting $H_i$ denote the depth of the excursion between the $i$-th visit of $t$ and
its $(i+1)$-th visit, and stating that the variables $H_1, H_2, \dots$ form a sequence of i.i.d. random
variables distributed as $H$ and killed at its first value greater than $t$.
Figure 4: Illustration of a splitting tree showing the durations $H_1, H_2, H_3$ elapsed since
coalescence for each of the three consecutive pairs $(x_1, x_2)$, $(x_2, x_3)$ and $(x_3, x_4)$ of the $N_t = 4$
individuals alive at time $t$.

But in the splitting tree, $H_i$ is also the coalescence time (or divergence time) between
individual $i$ and individual $i+1$, that is, the time elapsed since the lineages of individual $i$ and
$i+1$ have diverged. Further, it can actually be shown [22] that the coalescence time $C_{i,i+k}$
between individual $i$ and individual $i+k$ is given by

\[
C_{i,i+k} = \max\{H_i, \dots, H_{i+k-1}\}, \tag{3}
\]

so that the genealogical structure of the alive population of a splitting tree is entirely given
by the knowledge of a sequence of independent random variables $H_1, H_2, \dots$ that we will call
branch lengths, all distributed as $H$. We call the whole sequence the coalescent point process.
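The way the branch lengths encode the whole genealogy can be made concrete with a few lines of code. The sketch below assumes the labelling used in the text and in Figure 4, namely that $H_j$ is the divergence time between individuals $j$ and $j+1$ (individuals labelled from 1), so that the coalescence time of individuals $i$ and $i+k$ is the largest of the $k$ branch lengths separating them. The function name is hypothetical.

```python
def coalescence_time(branch_lengths, i, k):
    """Coalescence time between individuals i and i + k (1-based labels).

    branch_lengths[j - 1] holds H_j, the divergence time between
    individuals j and j + 1 (labelling convention assumed here).
    """
    if i < 1 or k < 1 or i + k - 1 > len(branch_lengths):
        raise ValueError("individuals i and i + k must both be in the population")
    # The k branch lengths H_i, ..., H_{i+k-1} separate the two individuals.
    return max(branch_lengths[i - 1 : i + k - 1])
```

With `branch_lengths = [0.4, 1.3, 0.2]` (four individuals), individuals 1 and 2 coalesce 0.4 time units ago, while individuals 1 and 4 coalesce 1.3 time units ago.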

Here, exact formulae can be deduced for (1) and (2) from the fact that the JCCP is a
Lévy process with no negative jumps. In particular, it can be convenient to handle its Laplace
exponent $\psi$ instead of its jump measure $\pi$, that is,

\[
\psi(a) := a - \int_{(0,\infty]} \pi(dx)\,(1 - e^{-ax}), \qquad a \ge 0. \tag{4}
\]





We know [22] that the process is subcritical, critical or supercritical, according to whether
$m := \int_{(0,\infty]} r\,\pi(dr)$ is $< 1$, $= 1$ or $> 1$. In the latter case, the rate $\eta$ at which $(N_t; t \ge 0)$
grows exponentially on the event of non-extinction, called the Malthusian parameter, is the
only nonzero root of the convex function $\psi$. Furthermore, the probability of exit of an interval
(from the bottom or from the top) by $X$ has a simple expression (see e.g. [2]), in the form

\[
P_s(T_0 < T_{(t,+\infty)}) = \frac{W(t-s)}{W(t)}, \tag{5}
\]

where the so-called scale function $W$ is the nonnegative, nondecreasing, differentiable function
such that $W(0) = 1$, characterized by its Laplace transform

\[
\int_0^\infty dx\, e^{-ax}\,W(x) = \frac{1}{\psi(a)}, \qquad a > \eta. \tag{6}
\]
As a consequence, the typical branch length $H$ between two consecutive individuals alive at
time $t$ has the following distribution (conditional on there being at least two extant individuals
at time $t$):

\[
\mathbb{P}(H < s) = P_t(T_{(t,+\infty)} < T_{t-s} \mid T_{(t,+\infty)} < T_0)
= \frac{1 - \frac{1}{W(s)}}{1 - \frac{1}{W(t)}}, \qquad 0 \le s \le t. \tag{7}
\]

Let us stress that in some examples, (6) can be inverted. When $\pi$ has an exponential density,
$(N_t; t \ge 0)$ is a linear birth-death process with (birth rate $\lambda$ and) death rate, say $p$. If $\lambda \ne p$,
then (see [22] for example)

\[
W(x) = \frac{p - \lambda e^{(\lambda - p)x}}{p - \lambda}, \qquad x \ge 0,
\]

whereas if $\lambda = p$,

\[
W(x) = 1 + \lambda x, \qquad x \ge 0.
\]

When $\pi$ is a point mass at $\infty$, $(N_t; t \ge 0)$ is a pure-birth process, called Yule process, with
birth rate $\lambda$. Then (let $p \to 0$)

\[
W(x) = e^{\lambda x}, \qquad x \ge 0.
\]
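These special cases lend themselves to a quick numerical sanity check. The sketch below assumes the closed form $W(x) = (p - \lambda e^{(\lambda-p)x})/(p-\lambda)$ for the non-critical birth-death case, which satisfies $W(0) = 1$ and reduces to the two limits quoted above ($p \to \lambda$ and $p \to 0$); the branch-length distribution is then evaluated from display (7). The function names are hypothetical.

```python
from math import exp, isclose

def scale_function(x, lam, p):
    """W(x) for a linear birth-death process with birth rate lam and
    death rate p (p = 0 gives the Yule process)."""
    if lam == p:
        return 1.0 + lam * x
    return (p - lam * exp((lam - p) * x)) / (p - lam)

def branch_length_cdf(s, t, lam, p):
    """P(H < s) for 0 <= s <= t, i.e. (1 - 1/W(s)) / (1 - 1/W(t))."""
    return (1 - 1 / scale_function(s, lam, p)) / (1 - 1 / scale_function(t, lam, p))
```

Evaluating `scale_function(x, lam, p)` with `p` very close to `lam` returns a value close to `1 + lam * x`, which illustrates how the critical case arises as a limit of the non-critical one.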

In the case when $\lambda \ne p \ne 0$, it had already been noticed by B. Rannala [25] that the coalescence
times of a population whose genealogy is given by a (linear) birth-death process started (singly)
$t$ units of time ago and whose size is conditioned to be $n$, are identical to those of the order
statistics of $n$ i.i.d. random variables with density

\[
f(s) = \frac{(1 - p_0(s))\,(p - \lambda p_0(s))}{p_0(t)}, \qquad 0 < s < t,
\]

where $p$ is the death rate and

\[
p_0(t) := \frac{p\,(e^{rt} - 1)}{\lambda e^{rt} - p},
\]

where $r := \lambda - p$. Now (7) applied to the expression of the scale function given previously for
the birth-death case ($\lambda \ne p$) agrees with the findings of B. Rannala under the form

\[
f(s)\,ds = \mathbb{P}(H \in ds) = \frac{r^2 e^{rs}}{(\lambda e^{rs} - p)^2}\cdot\frac{\lambda e^{rt} - p}{e^{rt} - 1}\,ds, \qquad 0 < s < t.
\]

It is remarkable that in this case, exchanging $\lambda$ and $p$ leaves the distribution of $H$ unchanged.
No extension of this fact is known in the general case.
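Both the agreement with Rannala's density and the symmetry under exchange of the birth and death rates can be checked numerically. In the sketch below, `p0` is taken to be the extinction probability by time $u$ of a linear birth-death process started from one individual, $p_0(u) = p(e^{ru}-1)/(\lambda e^{ru}-p)$ with $r = \lambda - p$; this is an assumption made for the illustration, as are the function names.

```python
from math import exp, isclose

def branch_length_density(s, t, lam, p):
    """Density of H obtained by differentiating (7) in the birth-death case."""
    r = lam - p
    return (r * r * exp(r * s) / (lam * exp(r * s) - p) ** 2
            * (lam * exp(r * t) - p) / (exp(r * t) - 1))

def rannala_density(s, t, lam, p):
    """Density written with the extinction probability p0 (assumed form)."""
    r = lam - p

    def p0(u):
        return p * (exp(r * u) - 1) / (lam * exp(r * u) - p)

    return (1 - p0(s)) * (p - lam * p0(s)) / p0(t)
```

For example, with `s, t, lam, p = 0.8, 2.5, 1.4, 0.6`, the two functions return the same value up to rounding, and so does `branch_length_density(s, t, p, lam)` with the two rates exchanged.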
We end this section by the following lemma. 

Lemma 3.1 The one-dimensional marginal of $N_t$ when the lifespan of the progenitor is random
with law $\pi(\cdot)/\lambda$, is given by

\[
\mathbb{P}(N_t = 0) = 1 - \frac{W'(t)}{\lambda W(t)}
\]

and, for $k \ge 1$,

\[
\mathbb{P}(N_t = k) = \left(1 - \frac{1}{W(t)}\right)^{k-1} \frac{W'(t)}{\lambda W(t)^2}, \qquad t \ge 0.
\]

Proof. From (1) and (5), we get

\[
\mathbb{P}_s(N_t = 0) = \frac{W(t-s)}{W(t)},
\]

and from (2) and (5), we get

\[
\mathbb{P}_s(N_t = k \mid N_t \ne 0) = \left(1 - \frac{1}{W(t)}\right)^{k-1} \frac{1}{W(t)}.
\]

Let us compute the unconditional law of $N_t$ by integrating over $s$. First,

\[
\mathbb{P}(N_t = 0) = \int_{(0,t]} \frac{\pi(ds)}{\lambda}\,\frac{W(t-s)}{W(t)} = \frac{F(t)}{\lambda W(t)},
\]

where

\[
F(t) := \int_{(0,t]} \pi(ds)\,W(t-s), \qquad t \ge 0.
\]

Now by Fubini-Tonelli,

\[
\int_0^\infty dt\,F(t)\,e^{-at} = \int_{(0,\infty]} \pi(ds) \int_s^\infty dt\,e^{-at}\,W(t-s)
= \frac{1}{\psi(a)} \int_{(0,\infty]} \pi(ds)\,e^{-as},
\]

referring to (6), where we recall from (4) that

\[
\psi(a) = a - \int_{(0,\infty]} \pi(dx)\,(1 - e^{-ax}) = a - \lambda + \int_{(0,\infty]} \pi(dx)\,e^{-ax}, \qquad a \ge 0.
\]

This yields

\[
\int_0^\infty dt\,F(t)\,e^{-at} = 1 + \frac{\lambda - a}{\psi(a)}.
\]

This Laplace transform can be inverted as follows:

\[
F(t) = \lambda W(t) - W'(t), \qquad t \ge 0.
\]

Thus, we get the announced expression for $N_t$. □
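As a sanity check of the lemma, one can verify numerically that the stated probabilities sum to one. The sketch below takes the scale function and its derivative as callables; the function name `population_size_law` is hypothetical, and the critical birth-death case $W(x) = 1 + \lambda x$ is used only as an example.

```python
from math import isclose

def population_size_law(k, t, lam, W, dW):
    """P(N_t = k) from Lemma 3.1, for a progenitor with lifetime law pi/lam.

    W and dW are callables returning the scale function and its derivative.
    """
    w, dw = W(t), dW(t)
    if k == 0:
        return 1 - dw / (lam * w)
    return (1 - 1 / w) ** (k - 1) * dw / (lam * w * w)
```

In the critical case with `lam = 1.5` and `t = 2.0`, summing `population_size_law(k, ...)` over `k` gives 1, and the mean of the law is also 1, as expected for a critical branching process started from one individual.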

4 The immigration model 

Assume that we start at time 0 on the island with no individual at all. Let $I_t$ denote the
total number of extant individuals at time $t$. Let $I_t(k)$ denote the number of species (each
corresponding to a single progenitor immigrant) with $k$ representative individuals at time $t$. In
particular,

\[
I_t = \sum_{k \ge 1} k\,I_t(k).
\]

We allow $k$ to equal 0, $I_t(0)$ corresponding to the number of immigrants having
no descendance at time $t$. Recall the scale function $W$ from Section 3.
Theorem 4.1 The random variables $(I_t(0), I_t(1), \dots)$ are independent Poisson random variables.
For any $k \ne 0$, the r.v. $I_t(k)$ is a Poisson r.v. with parameter

\[
\frac{\gamma}{k}\left(1 - \frac{1}{W(t)}\right)^{k},
\]

where $\gamma := \mu/\lambda$ is the immigration-to-birth ratio. The Poisson r.v. $I_t(0)$ has parameter

\[
\mu t - \gamma \ln W(t).
\]

Thanks to a standard result on independent Poisson random variables $X_k$ with respective
means $c\alpha^k/k$, conditioning on $\sum_k kX_k$ removes the dependence on $\alpha$ (see e.g. [28, p. 220]). It is
then remarkable that conditioning the frequency spectrum on the total number of individuals
removes the dependence on $t$. In the case of exponential lifetimes, this property has been
rediscovered various times, see for example [24]. Here, the conditioning removes not only the
dependence on $t$, but also the dependence on $\lambda$, $W$ or $\pi$, that is, on the whole dynamical scheme.

Corollary 4.2 Let $Y_1, Y_2, \dots$ be independent random variables, where $Y_k$ follows the Poisson
distribution with parameter $\gamma/k$. Conditional on the total number $I_t$ of individuals at time $t$
equalling $n$, the random vector $(I_t(1), \dots, I_t(n))$, then also denoted $(I_n(1), \dots, I_n(n))$, has the
same law as $(Y_1, \dots, Y_n)$ conditioned on $\sum_{k=1}^n kY_k = n$.

Remark 1 This conditional spectrum is exactly the same as that obtained in the Kingman
coalescent with mutations at rate $\gamma$ in the infinite-alleles model (i.e., the spectrum given
by Ewens' sampling formula). In the case of exponential lifetimes, this coincidence between
the binary branching process with immigration and the Moran process with mutations can be
explained thanks to Hoppe's urn model. This observation has been recast in the neutral
theory of biodiversity literature as a possible relaxation of the 'zero-sum assumption' [?].

Remark 2 Theorem 4.1 is concerned with species with fixed abundances $k = 1, 2, \dots$, i.e., the
'small' families. It is also possible to get results for the abundances $P_1, P_2, \dots$ of the
immigrant surviving families ranked by decreasing order of ages, i.e., the 'large' families, either
as the population size $n \to \infty$ or as time $t \to \infty$ in the supercritical case (mean number of
offspring $m > 1$). M. Richard [26] obtains that the vector $(P_1, P_2, \dots)$ rescaled by population
size converges a.s. to the GEM distribution with parameter $\gamma$.

Let us now prove the theorem. Let $M_t$ be the number of immigrants having reached the island
up until time $t$, and $T_1 < \dots < T_{M_t} < t$ the times of arrival of these immigrants. For any
integer $n$, let $\sigma_n$ denote an independent, random (uniform) permutation of $\{1, \dots, n\}$. Then
$M_t$ is a Poisson r.v. with parameter $\mu t$, and conditional on $M_t = n$, the random variables
$(T_{\sigma_n(1)}, \dots, T_{\sigma_n(n)})$ are i.i.d., uniformly distributed on $[0,t]$. We call $Z_t^{(i)}$ the number of
descendants at time $t$ of the particle having immigrated at time $T_{\sigma_n(i)}$. The random variables
$(Z_t^{(i)}; i = 1, \dots, n)$ are i.i.d., distributed as some r.v. $Z_t$ which is the value of the
Crump-Mode-Jagers process $N_t$ started at a uniform time on $[0,t]$:

\[
\mathbb{P}(Z_t = k) = \frac{1}{t}\int_0^t du\,\mathbb{P}(N_u = k),
\]

where it will always be understood that $N_0 = 1$. The following statement is the key result for
the theorem.
Proposition 4.3 The law of $Z_t$ is given by the following two equations:

\[
\mathbb{P}(Z_t = k) = \frac{1}{\lambda k t}\left(1 - \frac{1}{W(t)}\right)^{k}
\]

for $k \ne 0$, whereas

\[
\mathbb{P}(Z_t = 0) = 1 - \frac{1}{\lambda t}\ln W(t).
\]

Before proving the proposition, we remind the reader of an elementary lemma on multinomial 
distributions with Poisson randomizing parameter. The theorem follows from this lemma and 
the proposition. 

Lemma 4.4 Let $p := (p_0, p_1, \dots)$ be some probability distribution on the integers, let $X_1, X_2, \dots$
be i.i.d. r.v. with law $p$ and let $B$ be an independent Poisson r.v. with parameter $\beta$. Finally,
set

\[
B_k := \#\{i = 1, \dots, B : X_i = k\}, \qquad k \ge 0.
\]

Then the random variables $B_0, B_1, \dots$ are independent Poisson r.v., and $B_k$ has parameter $\beta p_k$.
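The mechanism behind this lemma can be verified exactly when $p$ charges only two values. The sketch below computes the joint law of $(B_0, B_1)$ directly from the definition (a Poisson number of trials, each falling in class 0 with probability `p0`) and compares it with the product of two Poisson laws. The function names are hypothetical.

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(j, mean):
    """Probability that a Poisson variable with the given mean equals j."""
    return exp(-mean) * mean ** j / factorial(j)

def joint_count_pmf(a, b, beta, p0):
    """P(B_0 = a, B_1 = b) when p = (p0, 1 - p0): the Poisson total B must
    equal a + b, and exactly a of the a + b draws must fall in class 0."""
    total = a + b
    return poisson_pmf(total, beta) * comb(total, a) * p0 ** a * (1 - p0) ** b
```

For any `a`, `b`, the value `joint_count_pmf(a, b, beta, p0)` coincides with `poisson_pmf(a, beta * p0) * poisson_pmf(b, beta * (1 - p0))`, which is the independence statement of the lemma in this two-class case.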

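Proof of the proposition. Thanks to Lemma 3.1, we have

\[
\mathbb{P}(N_t \ne 0) = \frac{W'(t)}{\lambda W(t)}
\]

and

\[
\mathbb{P}(N_t = k) = \left(1 - \frac{1}{W(t)}\right)^{k-1} \frac{W'(t)}{\lambda W(t)^2}.
\]

Let us now turn to $Z_t$, which has the law of $N_t$ with origination time uniform on $[0,t]$. First,

\[
\mathbb{P}(Z_t \ne 0) = \frac{1}{t}\int_0^t du\,\mathbb{P}(N_u \ne 0) = \frac{1}{t}\int_0^t du\,\frac{W'(u)}{\lambda W(u)} = \frac{1}{\lambda t}\ln W(t).
\]

Second,

\[
\mathbb{P}(Z_t = k) = \frac{1}{t}\int_0^t du\,\mathbb{P}(N_u = k)
= \frac{1}{\lambda t}\int_0^t du \left(1 - \frac{1}{W(u)}\right)^{k-1} \frac{W'(u)}{W(u)^2}
= \frac{1}{\lambda k t}\left(1 - \frac{1}{W(t)}\right)^{k},
\]

which ends the proof of the proposition. □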
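The last integration can be cross-checked by numerical quadrature. The sketch below averages the law of $N_u$ from Lemma 3.1 over a uniform origination time with a midpoint rule, and compares the result with the closed form of the proposition; multiplying by $\mu t$ then gives the Poisson parameter of $I_t(k)$. The function names and the quadrature scheme are choices made for this illustration.

```python
from math import isclose

def uniform_start_law(k, t, lam, W, dW, steps=20000):
    """Midpoint approximation of (1/t) * int_0^t P(N_u = k) du for k >= 1,
    with P(N_u = k) = (1 - 1/W(u))**(k-1) * W'(u) / (lam * W(u)**2)."""
    h = t / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * h
        w = W(u)
        total += (1 - 1 / w) ** (k - 1) * dW(u) / (lam * w * w)
    return total * h / t

def closed_form(k, t, lam, W):
    """P(Z_t = k) as stated in the proposition, for k >= 1."""
    return (1 - 1 / W(t)) ** k / (lam * k * t)

def poisson_parameter(k, t, lam, mu, W):
    """Parameter of I_t(k): mu * t * P(Z_t = k) = (gamma / k) * (1 - 1/W(t))**k."""
    return mu * t * closed_form(k, t, lam, W)
```

In the critical birth-death case (`W = lambda x: 1 + lam * x`, `dW = lambda x: lam`), the two computations of the law of $Z_t$ agree to high accuracy for small values of `k`.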



5 The mutation model 

Recall from the section on splitting trees and coalescent point processes that the genealogy
at a fixed time $t$ of the $N_t$ extant individuals of the splitting tree, originating from a single
progenitor individual born at time 0, is characterized by the branch lengths $H_i$, $i = 1, \dots, N_t - 1$,
where $H_i$ is the divergence time between individual $i$ and individual $i+1$. In addition, these
r.v. are i.i.d. with common distribution

\[
\mathbb{P}(H < s) = \frac{1 - \frac{1}{W(s)}}{1 - \frac{1}{W(t)}}, \qquad 0 \le s \le t,
\]
where the so-called scale function $W$ depends on the birth rate $\lambda$ and on the lifespan measure
$\pi$, and is characterized by its Laplace transform.

In the critical or supercritical cases, where $W$ is unbounded, we can define the long-lived
tree asymptotics, by letting $t \to \infty$. This leads to

\[
\mathbb{P}(H > s) = \frac{1}{W(s)}, \qquad s \ge 0,
\]

and the stationary genealogy is then given by an infinite sequence of branches with i.i.d. lengths,
with tail as in the last display. In the subcritical case, $W$ has a finite limit equal to $1/(1-m)$
(see [22]). Then conditioning on the population being still extant at time $t$ and letting $t \to \infty$,
the quasi-stationary genealogy is given by a geometric number (with parameter $m$) of branches with
i.i.d. lengths distributed as follows:

\[
\mathbb{P}^\star(H < s) = m^{-1}\left(1 - \frac{1}{W(s)}\right), \qquad s \ge 0,
\]

where the star superscript serves as a reminder of the conditioning.

In this section, individuals experience mutations at rate $\theta$ during their lifetime, and each
mutation yields a brand new type. This assumption corresponds to what is usually called the
infinitely-many alleles model. We now introduce the function $W_\theta$, which is the scale function
associated with the so-called clonal process. More specifically, if one restricts the tree to points
bearing the same type (e.g., the same type as the progenitor's type), then one retrieves a new
splitting tree, whose birth rate remains equal to $\lambda$ and whose lifetime durations are distributed
as a r.v. $V_\theta$ defined as the minimum of a generic lifetime $V$ and of an independent exponential
variable with parameter $\theta$ (i.e., the first mutation event). As in [21], we can then define $H_\theta$ as
the divergence time between consecutive individuals in the clonal splitting tree. In the (more
general) coalescent point process, $H_\theta$ is defined as the divergence time between individual 0 and the
first individual whose type satisfies the following property: it is one of the successive types that
appeared across time in the history of the lineage of individual 0. We have proved [21] that the
function $W_\theta$ (either defined as the scale function of the clonal splitting tree or equivalently, in
the coalescent point process, as the inverse of the tail of $H_\theta$) satisfies

\[
W_\theta(x) = 1 + \int_0^x W'(s)\,e^{-\theta s}\,ds, \qquad x \ge 0. \tag{8}
\]
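Relation (8) is easy to evaluate numerically once the derivative of $W$ is available. The sketch below uses a midpoint rule and, as an example, the critical birth-death case $W(x) = 1 + \lambda x$, for which (8) integrates to $W_\theta(x) = 1 + (\lambda/\theta)(1 - e^{-\theta x})$. The function name and the quadrature scheme are assumptions made for this illustration.

```python
from math import exp, isclose

def clonal_scale_function(x, theta, dW, steps=20000):
    """Midpoint approximation of W_theta(x) = 1 + int_0^x W'(s) exp(-theta s) ds.

    dW is a callable returning the derivative W' of the scale function.
    """
    h = x / steps
    integral = sum(
        dW((j + 0.5) * h) * exp(-theta * (j + 0.5) * h) for j in range(steps)
    ) * h
    return 1.0 + integral
```

With `lam = 1.5`, `theta = 0.4` and `dW = lambda s: lam`, the value of `clonal_scale_function(3.0, theta, dW)` matches the closed form `1 + lam / theta * (1 - exp(-theta * 3.0))` to high accuracy.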



Now consider the standing population at time $t$ conditioned on being nonempty, whose probability
law we denote by $\mathbb{P}^\star$. For any real number $y \in (0,t)$, define $A_t(k; dy)$ as the number
of species originating in a point mutation having occurred during the time interval $(y, y+dy)$
and represented by exactly $k$ alive individuals at time $t$. The following statement gives the
expectation under $\mathbb{P}^\star$ of $A_t(k; dy)$ and is extracted from [3].

Theorem 5.1 For any $k \ge 1$, the expected number of species of age in $dy$ and abundance $k$ is

\[
\mathbb{E}^\star A_t(k; dy) = \theta\,dy\,W(t)\,\frac{e^{-\theta y}}{W_\theta(y)^2}\left(1 - \frac{1}{W_\theta(y)}\right)^{k-1}.
\]

In [3], we provide arguments giving an intuition of this result. To be more specific, the last
expression can be seen as the product of the three following terms:

\[
\frac{W(t)}{W(y)}\,\theta\,dy,
\]

which is the sum over $i = 1, 2, \dots$ of the probabilities that the $i$-th branch length has size $H_i > y$
and (is the one that) carries a mutation with age in $(y, y+dy)$, multiplied by

\[
\frac{W(y)\,e^{-\theta y}}{W_\theta(y)},
\]

which is the probability that the type carried by the lineage of the $i$-th individual at time $t-y$
has at least one alive representative, finally multiplied by

\[
\frac{1}{W_\theta(y)}\left(1 - \frac{1}{W_\theta(y)}\right)^{k-1},
\]

which is the probability that the type carried by the lineage of the $i$-th individual at time $t-y$
has exactly $k$ alive representatives, conditional on having at least 1.

Recall that $A_t$ denotes the number of species in the population at time $t$ and that $A_t(k)$
denotes the number of species represented by exactly $k$ extant individuals. We can record the
last theorem in integral form:

Proposition 5.2 For any $k \ge 1$,

\[
\mathbb{E}^\star A_t(k) = W(t)\int_0^t dy\,\theta e^{-\theta y}\,\frac{1}{W_\theta(y)^2}\left(1 - \frac{1}{W_\theta(y)}\right)^{k-1}
\]

and

\[
\mathbb{E}^\star A_t = W(t)\int_0^t dy\,\theta e^{-\theta y}\,\frac{1}{W_\theta(y)}.
\]



Furthermore, we obtained the following asymptotic result, extracted from [3] and [21]. Here, $A_n(k)$
denotes the number of species with $k$ individuals in the coalescent point process with population
size $n$. Recall that coalescent point processes with different population sizes can be constructed
on the same space by merely adding new independent branches. This allows us to state pathwise
convergences for $A_n$ as $n \to \infty$.

Theorem 5.3 For all $k \ge 1$, the following convergence holds a.s., as $n \to \infty$ for the coalescent
point process, and as $t \to \infty$ for the splitting tree in the supercritical case and on the event of
non-extinction:

\[
\lim_{n\to\infty} n^{-1}A_n(k) = \lim_{t\to\infty} N_t^{-1}A_t(k)
= \int_0^\infty dy\,\theta e^{-\theta y}\,\frac{1}{W_\theta(y)^2}\left(1 - \frac{1}{W_\theta(y)}\right)^{k-1}
\]

and

\[
\lim_{n\to\infty} n^{-1}A_n = \lim_{t\to\infty} N_t^{-1}A_t = \int_0^\infty dy\,\theta e^{-\theta y}\,\frac{1}{W_\theta(y)}.
\]

Remark 3 The a.s. result for coalescent point processes relies on laws of large numbers (see
[21]). The a.s. result for splitting trees relies on the theory of random characteristics
introduced in the seminal paper [15] and further developed in [17].
Remark 4 As in the last section, one could ask about the behaviour of large families, as the
number $n$ of individuals grows. In contrast to the immigration case, here there are no families
with abundances $O(n)$. Preliminary calculations [4] show that there are two possible regimes,
depending on the respective positions of the mutation rate $\theta$ and of the Malthusian parameter
$\eta$ (see section on splitting trees). In the case when $\theta < \eta$, the abundance of the largest family is
of order $O(n^\beta)$, where $\beta = 1 - \theta/\eta$; otherwise it is of order $O(\log(n))$.

As in the previous section, we have displayed results holding for a general lifespan measure
$\pi$. On the other hand, here the quantities displayed in the theorem can only be computed explicitly
in the case of critical birth-death processes, that is, when the death rate of individuals is constant,
equal to their birth rate $\lambda$, so that $W(x) = 1 + \lambda x$. In that case, $W_\theta'(x) = \lambda e^{-\theta x}$, and we can
integrate the quantities in the theorem.

Corollary 5.4 In the case of a critical birth-death process with birth and death rate $\lambda$,

\[
\lim_{n\to\infty} n^{-1}A_n(k) = (\alpha^{-1} - 1)\,\frac{\alpha^k}{k} \quad \text{a.s.},
\]

where

\[
\alpha := \frac{\lambda}{\lambda + \theta}.
\]

In addition,

\[
\lim_{n\to\infty} n^{-1}A_n = -(\alpha^{-1} - 1)\ln(1 - \alpha) \quad \text{a.s.}
\]
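The corollary can be cross-checked against the integral of Theorem 5.3 by numerical quadrature. The sketch below assumes the critical case, where $W_\theta(y) = 1 + (\lambda/\theta)(1 - e^{-\theta y})$, truncates the integral at a large upper bound and uses a midpoint rule; the function name, the truncation level and the number of steps are choices made for this illustration.

```python
from math import exp, isclose

def limiting_frequency(k, lam, theta, upper=60.0, steps=200000):
    """Midpoint approximation of
    int_0^inf theta exp(-theta y) W_theta(y)**(-2) (1 - 1/W_theta(y))**(k-1) dy
    in the critical birth-death case, truncated at y = upper."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        w = 1 + lam / theta * (1 - exp(-theta * y))
        total += theta * exp(-theta * y) / (w * w) * (1 - 1 / w) ** (k - 1)
    return total * h
```

With `lam = 1.0` and `theta = 0.5` (so $\alpha = 2/3$), the returned values agree with $(\alpha^{-1} - 1)\alpha^k/k$ for small `k`, up to the quadrature error.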